EPGOAT Documentation - Work In Progress

API Enrichment Service Refactoring Design

Date: 2025-11-03
Last Updated: 2025-11-09
Status: Approved - Ready for Implementation
Design Session: Phase 2 Code Review - Critical File Analysis
Target File: backend/epgoat/services/api_enrichment.py


Executive Summary

Refactor the 2068-line backend/epgoat/services/api_enrichment.py God class into 14 focused modules using the Chain of Responsibility and Observer patterns. The current implementation violates the Single Responsibility Principle: it carries 10+ responsibilities and contains an 800-line function. The refactoring splits the monolith into 3 layers: Pipeline Orchestration, Handler Chain, and Support Services.

Key Metrics:
  • Current: 1 file, 2068 lines, 10+ responsibilities, 800-line function
  • Target: 14 files, ~1900 lines total, single responsibility per file, <300 lines per file
  • Benefits: 16x reduction in max function size, 100% SOLID compliance, isolated testability


Context

Current Problem

The EPGEnricher class (backend/epgoat/services/api_enrichment.py) is a God class with critical violations:

SOLID Violations:
  • Single Responsibility: Handles 10+ responsibilities (API enrichment, team parsing, cache management, regex matching, league inference, sport detection, FloSports mapping, time extraction, cost tracking, statistics learning)
  • Open/Closed: Modification required to add new cache layers or enrichment strategies
  • Function Length: enrich_event() is 800 lines (16x over 50-line standard)

Type Safety Issues:
  • Missing type hints on __init__ parameters
  • Bare dict instead of dict[str, Any]

Error Handling Issues:
  • Overly broad except Exception catches (3 violations)
  • Should use specific exception types

Complexity Issues:
  • Deep nesting (6+ levels)
  • Cyclomatic complexity >20
  • Multiple loops over candidate leagues

Current Architecture

EPGEnricher (2068 lines)
  ├─ API enrichment orchestration
  ├─ Team parsing (parse_teams_from_payload, _clean_team_name)
  ├─ League inference (multi-strategy)
  ├─ Sport type detection (guess_sport_type_from_channel)
  ├─ 4-layer caching (Enhanced, Details, CrossProvider, Local DB)
  ├─ Regex matching (integration)
  ├─ FloSports mapping (extract_flosports_subcategory, map_flosports_to_league)
  ├─ Time extraction (get_event_times, _is_time_tba)
  ├─ Cost tracking (integration)
  └─ Statistics learning (family → league patterns)

Design Decisions

Decision 1: Breaking Changes Allowed

Context: Need to refactor 2068-line God class violating SOLID principles.

Decision: Allow breaking changes to public API for clean architecture.

Rationale:
  • Current API is poorly designed (too many constructor parameters)
  • Breaking changes enable true Single Responsibility
  • Clean dependency injection requires new interface
  • Backward compatibility would force compromises

Consequences:
  • ✅ Maximum flexibility for SOLID refactoring
  • ✅ Clean architecture patterns possible
  • ✅ Isolated component testing
  • ⚠️ Callers must update to new API (documented migration path provided)

Decision 2: Chain of Responsibility for Enrichment Pipeline

Context: Sequential enrichment strategy with fallback mechanisms (cache → regex → DB → API).

Decision: Use Chain of Responsibility pattern with 7 handlers.

Rationale:
  • Natural fit for sequential fallback logic
  • Each handler is independent and testable
  • Easy to add/remove/reorder strategies
  • Clear separation of concerns
  • Handlers can fail gracefully without breaking chain

Alternatives Considered:
  1. Strategy Pattern with Coordinator: Would create complex coordinator logic
  2. Layered Service Architecture: Tighter coupling between layers

Consequences:
  • ✅ Easy to add new enrichment strategies
  • ✅ Easy to reorder priority
  • ✅ Each handler <100 lines, highly focused
  • ✅ Isolated testing per handler
  • ⚠️ More classes to manage (7 handlers vs 1 monolith)

Decision 3: Separate Service Classes for Support Logic

Context: Multiple distinct responsibilities (team parsing, league inference, sport detection, etc.).

Decision: Create 6 focused service classes, each <300 lines.

Classes:
  1. TeamParser: Team extraction and cleaning
  2. LeagueInferencer: Multi-strategy league inference
  3. SportTypeDetector: Sport detection and emoji mapping
  4. FloSportsMapper: FloSports subcategory mapping
  5. TimeExtractor: Event time parsing and TBA detection
  6. EventEnrichmentBuilder: Enrichment dictionary construction

Rationale:
  • True Single Responsibility Principle
  • Each service highly testable in isolation
  • Clear interfaces and dependencies
  • Easy to mock for testing
  • Reusable across different enrichment strategies

Consequences:
  • ✅ 100% SRP compliance
  • ✅ <300 lines per service
  • ✅ Isolated unit testing
  • ✅ Clear dependency graph
  • ⚠️ More files to navigate (6 services vs inline methods)

Decision 4: Rich Context Object (Mutable)

Context: Need to pass data through handler chain, accumulating parsed information.

Decision: Use mutable EnrichmentContext dataclass that flows through chain.

Rationale:
  • Pragmatic Python approach (vs pure functional)
  • Clear visibility into pipeline progression
  • Easy to debug (inspect context at any stage)
  • Accumulates parsed data from services
  • Single object to pass around

Alternatives Considered:
  1. Immutable Request/Response: Creates object overhead on each handler
  2. Separate Request + Accumulator: Two objects to pass around

Consequences:
  • ✅ All data in one place
  • ✅ Easy to debug
  • ✅ Clear progression through pipeline
  • ⚠️ Mutable state (acceptable tradeoff in Python)

Decision 5: Observer Pattern for Analytics

Context: Cross-cutting concerns (MatchDebugLogger, CostTracker, FamilyStatsTracker) need to observe enrichment without coupling.

Decision: Use Observer pattern with pluggable observers.

Rationale:
  • True separation of concerns
  • Easy to enable/disable analytics
  • No coupling between handlers and analytics
  • Observers can be added without changing handlers
  • Single responsibility maintained

Alternatives Considered:
  1. Inject Into Each Handler: Bloated handler signatures, tight coupling
  2. Context-Based Tracking: Mixes concerns in context object

Consequences:
  • ✅ Zero coupling between handlers and analytics
  • ✅ Easy to add/remove observers
  • ✅ Clean handler interfaces
  • ⚠️ Slightly more complex setup (factory handles this)

Decision 6: Each Cache Layer = Separate Handler

Context: 4 caching layers with different strategies and speeds.

Decision: Each cache layer gets its own handler in the chain.

Handler Order:
  1. EnhancedMatchCacheHandler (24h channel cache - fastest)
  2. EventDetailsCacheHandler (team/date/time lookup)
  3. LocalDatabaseHandler (bulk events ±3 days)
  4. RegexMatcherHandler (pattern matching)
  5. CrossProviderCacheHandler (shared across providers)
  6. APIHandler (live API calls - slowest)
  7. FallbackHandler (always succeeds)

Rationale:
  • Each cache is independent handler
  • Easy to add/remove/reorder cache layers
  • Each handler testable in isolation
  • Clear cache hierarchy
  • Natural fit for Chain of Responsibility

Consequences:
  • ✅ Flexible cache layer configuration
  • ✅ Independent testing per cache layer
  • ✅ Easy to measure cache hit rates per layer
  • ⚠️ More handlers (7 vs inline checks)

Decision 7: Factory Function for Dependency Injection

Context: Complex dependency graph with 10+ dependencies.

Decision: Use factory function (create_enrichment_pipeline()) as composition root.

Rationale:
  • Single place for all dependency wiring
  • Optional dependencies (can omit caches, API clients)
  • Clear dependency graph
  • Testable (can inject mocks)
  • Standard pattern (composition root)

Consequences:
  • ✅ Clear dependency management
  • ✅ Flexible configuration
  • ✅ Easy to test (inject mocks)
  • ⚠️ One more file (factory.py)


Architecture Overview

High-Level Structure

┌─────────────────────────────────────────────────────────────┐
│                     EnrichmentPipeline                      │
│   (Orchestration: prepares context, runs chain, notifies)   │
└──────────────────────────────┬──────────────────────────────┘
                               │
           ┌───────────────────┼───────────────────┐
           │                   │                   │
           ▼                   ▼                   ▼
     ┌───────────┐       ┌───────────┐       ┌───────────┐
     │ Services  │       │ Handlers  │       │ Observers │
     │ (Layer 3) │       │ (Layer 2) │       │ (Cross-   │
     └───────────┘       └───────────┘       │ cutting)  │
                                             └───────────┘

Layer 1: Pipeline Orchestration

EnrichmentPipeline (pipeline.py)
  • Coordinates handler chain
  • Prepares context with parsed data (using services)
  • Notifies observers at key points
  • Post-processes successful matches (update caches, learn patterns)
  • Size: ~150 lines

EPGEnricher (Facade)
  • Optional for gradual migration
  • Backward-compatible wrapper
  • Delegates to EnrichmentPipeline
  • Size: ~50 lines
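A facade of this kind can stay very small. The following is a minimal sketch, not the planned implementation: it assumes the facade receives an already-built pipeline (the real wrapper might instead accept the old constructor arguments and call the factory itself), and the SupportsEnrich protocol is a hypothetical name introduced only for typing.

```python
from datetime import date, datetime
from typing import Any, Optional, Protocol
from zoneinfo import ZoneInfo


class SupportsEnrich(Protocol):
    """Hypothetical protocol: anything exposing the pipeline's enrich() method."""

    def enrich(
        self,
        *,
        channel_name: str,
        family: str,
        payload: str,
        parsed_time: Optional[datetime],
        target_date: date,
        target_timezone: ZoneInfo,
    ) -> dict[str, Any]: ...


class EPGEnricher:
    """Thin wrapper that keeps the old enrich_event() call shape."""

    def __init__(self, pipeline: SupportsEnrich) -> None:
        # Assumption: the pipeline is injected rather than built here.
        self._pipeline = pipeline

    def enrich_event(
        self,
        channel_name: str,
        family: str,
        payload: str,
        parsed_time: Optional[datetime],
        target_date: date,
        target_timezone: ZoneInfo,
    ) -> dict[str, Any]:
        # No logic of its own: forward every argument to the pipeline.
        return self._pipeline.enrich(
            channel_name=channel_name,
            family=family,
            payload=payload,
            parsed_time=parsed_time,
            target_date=target_date,
            target_timezone=target_timezone,
        )
```

Because the wrapper only forwards arguments, callers could move to the pipeline directly at any time without a behavior change.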

Layer 2: Handler Chain (7 Handlers)

Each handler implements EnrichmentHandler ABC with _try_enrich(context) method.

Handler Priority Order (fastest β†’ slowest):

  1. EnhancedMatchCacheHandler (~50 lines)
     • 24-48h TTL, channel-level cache
     • 95%+ hit rate for same-day reprocessing
     • Dependencies: EnhancedMatchCache, EventDetailsCache, EventEnrichmentBuilder

  2. EventDetailsCacheHandler (~60 lines)
     • Team/date/time lookup
     • Cross-provider event detail storage
     • Dependencies: EventDetailsCache, EventEnrichmentBuilder

  3. LocalDatabaseHandler (~80 lines)
     • Bulk prefetched events
     • ±3 day search window, 70% similarity threshold
     • Dependencies: EventDatabase, EventDetailsCache, EventEnrichmentBuilder

  4. RegexMatcherHandler (~70 lines)
     • Pattern-based matching
     • High confidence (90%+) skips API calls
     • Dependencies: MultiStageRegexMatcher, EventEnrichmentBuilder

  5. CrossProviderCacheHandler (~60 lines)
     • Shared cache across providers
     • Singleton pattern for reuse
     • Dependencies: CrossProviderEventCache, EventEnrichmentBuilder

  6. APIHandler (~150 lines)
     • Live API calls (TheSportsDB + ESPN fallback)
     • Tries each candidate league in order
     • Dependencies: TheSportsDBClient, ESPNAPIClient, EventEnrichmentBuilder

  7. FallbackHandler (~30 lines)
     • Builds unmatched enrichment
     • Always succeeds (end of chain)
     • No dependencies
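The ±3 day search window and 70% similarity threshold listed for LocalDatabaseHandler could work roughly as sketched here. This is an illustration under stated assumptions: CandidateEvent and find_best_match are hypothetical names, and difflib.SequenceMatcher stands in for whatever similarity measure the existing code uses, which this document does not specify.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class CandidateEvent:
    """Hypothetical stand-in for a row in the prefetched event database."""

    name: str
    event_date: date


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_best_match(
    query: str,
    target_date: date,
    events: list[CandidateEvent],
    window_days: int = 3,
    threshold: float = 0.70,
) -> Optional[CandidateEvent]:
    """Return the most similar event inside the date window, or None."""
    window = timedelta(days=window_days)
    best: Optional[CandidateEvent] = None
    best_score = threshold
    for event in events:
        if abs(event.event_date - target_date) > window:
            continue  # outside the +/- window, skip without scoring
        score = similarity(query, event.name)
        if score >= best_score:
            best, best_score = event, score
    return best
```

Filtering by date before scoring keeps the more expensive string comparison limited to events that could plausibly match.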

Handler Base Class (handlers/base.py, ~50 lines):

class EnrichmentHandler(ABC):
    def handle(self, context: EnrichmentContext) -> EnrichmentContext:
        # Error handling + chain logic (full version under Component Specifications)
        ...

    @abstractmethod
    def _try_enrich(self, context: EnrichmentContext) -> EnrichmentContext:
        # Handler-specific logic
        ...

Layer 3: Support Services (6 Services)

Each service is a focused class with single responsibility:

  1. TeamParser (backend/epgoat/services/team_parser.py, ~200 lines)
     • parse_teams(payload): Extract teams using various separators
     • clean_team_name(team): Remove ranks, times, noise
     • canonicalize_teams(team1, team2): Resolve aliases to canonical names
     • Dependencies: TeamAliasIndex (optional)

  2. LeagueInferencer (backend/epgoat/services/league_inferencer.py, ~250 lines)
     • infer_league(channel_name, payload, family, teams, team_ids): Multi-strategy inference
     • extract_league_token(text): Explicit token detection
     • infer_from_team_ids(team_ids): Prefix-based inference
     • infer_from_team_database(teams): Database lookup
     • Dependencies: None

  3. SportTypeDetector (backend/epgoat/services/sport_detector.py, ~100 lines)
     • detect_sport_type(channel_name, payload): Sport detection from keywords
     • get_sport_emoji(sport_type): Map sport to emoji
     • parse_sport_from_title(title): Extract sport prefix (e.g., "Ice Hockey (W)")
     • Dependencies: None

  4. FloSportsMapper (backend/epgoat/services/flo_mapper.py, ~80 lines)
     • extract_subcategory(payload): Extract FloSports subcategory (e.g., "flohockey")
     • map_to_league(subcategory): Map to actual league (e.g., "flohockey" → "NHL")
     • Dependencies: None

  5. TimeExtractor (backend/epgoat/services/time_extractor.py, ~120 lines)
     • parse_event_time(event, parsed_time, timezone): Extract start/end times
     • is_tba(event, payload): Detect TBA/TBD
     • get_sport_duration(sport_family): Default duration by sport
     • Dependencies: None

  6. EventEnrichmentBuilder (backend/epgoat/services/enrichment_builder.py, ~180 lines)
     • build_from_event(event, context, api_source): Build complete enrichment dict
     • build_description(event, family): Rich description from event data
     • get_event_logos(event): Extract logo URLs
     • Dependencies: TimeExtractor, SportTypeDetector
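To make the TeamParser responsibilities concrete, here is a minimal sketch of separator-based parsing and name cleaning. The separator set, the rank and time patterns, and the module-level function form are all assumptions for illustration; the real parse_teams and clean_team_name are methods whose rules live in the existing EPGEnricher code.

```python
import re
from typing import Optional

# Assumed separator set: "vs", "v", "@", "at", and spaced hyphen or dash
# variants (U+2013, U+2014). The real parser may recognize others.
_SEPARATOR_RE = re.compile(
    r"\s+(?:vs\.?|v\.?|@|at)\s+|\s+[-\u2013\u2014]\s+",
    re.IGNORECASE,
)
# Leading rank markers such as "#5 " or "(12) ".
_RANK_RE = re.compile(r"^(?:#\d+|\(\d+\))\s*")
# Trailing clock times such as "7:30 PM" or "19:00".
_TIME_RE = re.compile(r"\s*\b\d{1,2}:\d{2}\s*(?:am|pm)?\s*$", re.IGNORECASE)


def clean_team_name(team: str) -> str:
    """Strip a leading rank and a trailing time from a raw team string."""
    team = _RANK_RE.sub("", team.strip())
    team = _TIME_RE.sub("", team)
    return team.strip()


def parse_teams(payload: str) -> tuple[Optional[str], Optional[str]]:
    """Split a payload on the first separator; (None, None) if no match."""
    parts = _SEPARATOR_RE.split(payload, maxsplit=1)
    if len(parts) != 2:
        return None, None
    team1, team2 = (clean_team_name(part) for part in parts)
    if not team1 or not team2:
        return None, None
    return team1, team2
```

Keeping cleaning separate from splitting mirrors the service's two listed methods and lets each be tested on its own.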

Cross-Cutting: Observers

Observer Base (observers/base.py, ~20 lines):

class EnrichmentObserver(ABC):
    @abstractmethod
    def notify(self, event: str, context: EnrichmentContext) -> None:
        pass

Implementations:
  • MatchDebugObserver (~60 lines): Structured diagnostics logging
  • CostTrackingObserver (optional): API cost tracking
  • FamilyStatsObserver (optional): Pattern learning
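As an illustration of how a new observer plugs in without touching any handler, the sketch below counts which handler produced each match. HitRateObserver is a hypothetical example, not one of the planned observers, and the EnrichmentContext here is a trimmed stand-in carrying only the two fields the observer reads. It assumes the pipeline emits a 'completed' event once per enrichment.

```python
from abc import ABC, abstractmethod
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnrichmentContext:
    """Trimmed stand-in for the real context; only the fields used here."""

    matched: bool = False
    match_source: Optional[str] = None


class EnrichmentObserver(ABC):
    @abstractmethod
    def notify(self, event: str, context: EnrichmentContext) -> None:
        """React to a pipeline event such as 'started' or 'completed'."""


class HitRateObserver(EnrichmentObserver):
    """Hypothetical observer: tracks the share of matches per handler."""

    def __init__(self) -> None:
        self.total = 0
        self.hits: Counter[str] = Counter()

    def notify(self, event: str, context: EnrichmentContext) -> None:
        if event != "completed":
            return  # only finished enrichments are counted
        self.total += 1
        if context.matched and context.match_source:
            self.hits[context.match_source] += 1

    def hit_rate(self, handler_name: str) -> float:
        """Fraction of completed enrichments matched by the given handler."""
        return self.hits[handler_name] / self.total if self.total else 0.0
```

An observer like this relies only on match_source, which the handler base class sets, so handlers need no knowledge of it.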

Data Flow

Input (channel_name, family, payload, parsed_time, target_date, timezone)
  │
  ▼
[EnrichmentPipeline.enrich()]
  │
  ├─ 1. Create EnrichmentContext
  ├─ 2. Notify observers ('started')
  ├─ 3. Prepare context (parse teams, infer league, detect sport)
  │     │
  │     ├─ TeamParser.parse_teams()
  │     ├─ TeamParser.canonicalize_teams()
  │     ├─ LeagueInferencer.infer_league()
  │     └─ SportTypeDetector.detect_sport_type()
  │
  ├─ 4. Run handler chain
  │     │
  │     ├─ EnhancedMatchCacheHandler → [miss]
  │     ├─ EventDetailsCacheHandler → [miss]
  │     ├─ LocalDatabaseHandler → [miss]
  │     ├─ RegexMatcherHandler → [miss]
  │     ├─ CrossProviderCacheHandler → [miss]
  │     ├─ APIHandler → [HIT!]
  │     │   │
  │     │   └─ EventEnrichmentBuilder.build_from_event()
  │     │
  │     └─ FallbackHandler (if all miss)
  │
  ├─ 5. Post-process (update caches, learn patterns)
  │     │
  │     ├─ CrossProviderCache.store_event()
  │     ├─ EnhancedMatchCache.store_match()
  │     ├─ FamilyStatsTracker.learn_match()
  │     └─ CostTracker.track_family_match()
  │
  └─ 6. Notify observers ('completed')
  │
  ▼
Output (enrichment dict)
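The six steps of the data flow can be condensed into a toy orchestration loop. This sketch is deliberately simplified and is not the planned EnrichmentPipeline: handlers and observers are plain callables, the Context and Pipeline classes are hypothetical, and the "prepare" step is reduced to a single split on " vs ".

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Context:
    """Toy context with just enough fields to show the flow."""

    payload: str
    team1: Optional[str] = None
    team2: Optional[str] = None
    enrichment: Optional[dict[str, Any]] = None
    matched: bool = False


# In this sketch a handler is any callable that may mark the context matched.
Handler = Callable[[Context], None]
Observer = Callable[[str, Context], None]


@dataclass
class Pipeline:
    handlers: list[Handler]
    observers: list[Observer] = field(default_factory=list)

    def _notify(self, event: str, context: Context) -> None:
        for observer in self.observers:
            observer(event, context)

    def enrich(self, payload: str) -> dict[str, Any]:
        context = Context(payload=payload)      # 1. create context
        self._notify("started", context)        # 2. notify observers
        if " vs " in payload:                   # 3. prepare (toy team parse)
            context.team1, context.team2 = payload.split(" vs ", 1)
        for handler in self.handlers:           # 4. run handlers in order
            handler(context)
            if context.matched:
                break                           # first match stops the chain
        # 5. post-processing (cache updates, learning) would go here
        self._notify("completed", context)      # 6. notify observers
        return context.enrichment or {"matched": False}
```

The early break in step 4 is what lets a cache hit skip the slower handlers further down the list.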

Component Specifications

EnrichmentContext (Data Transfer Object)

Location: enrichment/context.py

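from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Any, Optional
from zoneinfo import ZoneInfo

@dataclass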
class EnrichmentContext:
    """Context object passed through enrichment pipeline."""

    # INPUT (provided by caller)
    channel_name: str
    family: str
    payload: str
    parsed_time: Optional[datetime]
    target_date: date
    target_timezone: ZoneInfo

    # PARSED DATA (populated by services)
    normalized_channel_name: str = ""
    normalized_payload: str = ""
    team1: Optional[str] = None
    team2: Optional[str] = None
    team_ids: tuple[str, ...] = field(default_factory=tuple)
    candidate_leagues: list[str] = field(default_factory=list)
    inferred_league: Optional[str] = None
    sport_type: Optional[str] = None
    sport_emoji: Optional[str] = None

    # RESULT (populated by handlers)
    enrichment: Optional[dict[str, Any]] = None
    matched: bool = False
    match_source: Optional[str] = None

    # METADATA (for debugging)
    handler_attempts: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

EnrichmentHandler (Base Class)

Location: enrichment/handlers/base.py

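import logging
from abc import ABC, abstractmethod
from typing import Optional

from ..context import EnrichmentContext

logger = logging.getLogger(__name__)

class EnrichmentHandler(ABC):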
    """Base class for enrichment handlers (Chain of Responsibility)."""

    def __init__(self, next_handler: Optional['EnrichmentHandler'] = None):
        self._next_handler = next_handler

    def handle(self, context: EnrichmentContext) -> EnrichmentContext:
        """
        Attempt to enrich. If successful, return context with matched=True.
        If unsuccessful, pass to next handler in chain.
        """
        try:
            context.handler_attempts.append(self.__class__.__name__)
            result = self._try_enrich(context)

            if result.matched:
                result.match_source = self.__class__.__name__
                return result  # Stop chain

        except Exception as e:
            # Deliberately broad: a failing handler must not break the chain.
            logger.warning("%s failed: %s", self.__class__.__name__, e)
            context.errors.append(f"{self.__class__.__name__}: {e}")

        # Continue chain
        if self._next_handler:
            return self._next_handler.handle(context)

        return context

    @abstractmethod
    def _try_enrich(self, context: EnrichmentContext) -> EnrichmentContext:
        """Handler-specific enrichment logic."""
        pass
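A concrete handler only needs to implement _try_enrich. The sketch below shows one hypothetical cache layer and an end-of-chain placeholder built on a condensed copy of the base class (error handling omitted for brevity). DictCacheHandler and UnmatchedHandler are illustrative names, and the dict-backed cache is an assumption; the real cache handlers wrap the project's own cache classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class EnrichmentContext:
    """Trimmed stand-in for the full context defined above."""

    channel_name: str
    enrichment: Optional[dict[str, Any]] = None
    matched: bool = False
    match_source: Optional[str] = None
    handler_attempts: list[str] = field(default_factory=list)


class EnrichmentHandler(ABC):
    """Condensed copy of the base class, without the error handling."""

    def __init__(self, next_handler: Optional["EnrichmentHandler"] = None):
        self._next_handler = next_handler

    def handle(self, context: EnrichmentContext) -> EnrichmentContext:
        context.handler_attempts.append(type(self).__name__)
        result = self._try_enrich(context)
        if result.matched:
            result.match_source = type(self).__name__
            return result  # stop the chain on the first match
        if self._next_handler:
            return self._next_handler.handle(context)
        return context

    @abstractmethod
    def _try_enrich(self, context: EnrichmentContext) -> EnrichmentContext:
        ...


class DictCacheHandler(EnrichmentHandler):
    """Hypothetical cache layer backed by a dict keyed by channel name."""

    def __init__(self, cache: dict[str, dict[str, Any]],
                 next_handler: Optional[EnrichmentHandler] = None):
        super().__init__(next_handler)
        self._cache = cache

    def _try_enrich(self, context: EnrichmentContext) -> EnrichmentContext:
        hit = self._cache.get(context.channel_name)
        if hit is not None:
            context.enrichment = hit
            context.matched = True
        return context


class UnmatchedHandler(EnrichmentHandler):
    """Stand-in for the end of the chain: fills in a placeholder result."""

    def _try_enrich(self, context: EnrichmentContext) -> EnrichmentContext:
        context.enrichment = {"title": context.channel_name, "matched": False}
        return context
```

Whether the real FallbackHandler sets matched=True is not stated in this design; the placeholder above leaves it False so that unmatched results stay distinguishable.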

Factory Function

Location: enrichment/factory.py

def create_enrichment_pipeline(
    # API clients
    thesportsdb_client: Optional[TheSportsDBClient] = None,
    espn_client: Optional[ESPNAPIClient] = None,

    # Caches
    enhanced_cache: Optional[EnhancedMatchCache] = None,
    event_details_cache: Optional[EventDetailsCache] = None,
    event_database: Optional[EventDatabase] = None,
    cross_provider_cache: Optional[CrossProviderEventCache] = None,

    # Analytics
    match_debug_logger: Optional[MatchDebugLogger] = None,
    cost_tracker: Optional[CostTracker] = None,
    family_stats_tracker: Optional[FamilyStatsTracker] = None,

    # Other
    team_alias_index: Optional[TeamAliasIndex] = None,
    regex_matcher: Optional[MultiStageRegexMatcher] = None,
) -> EnrichmentPipeline:
    """
    Factory function to wire up the entire enrichment pipeline.

    This is the composition root - all dependency injection happens here.
    """
    # Create services
    team_parser = TeamParser(team_alias_index=team_alias_index)
    league_inferencer = LeagueInferencer()
    sport_detector = SportTypeDetector()
    flo_mapper = FloSportsMapper()
    enrichment_builder = EventEnrichmentBuilder(
        time_extractor=TimeExtractor(),
        sport_detector=sport_detector,
    )

    # Create handlers (conditionally based on what's provided).
    # Compare against None so an empty-but-valid cache is not skipped.
    handlers = []
    if enhanced_cache is not None:
        handlers.append(EnhancedMatchCacheHandler(...))
    if event_details_cache is not None:
        handlers.append(EventDetailsCacheHandler(...))
    # ... etc

    # Always add fallback
    handlers.append(FallbackHandler())

    # Create observers
    observers = []
    if match_debug_logger is not None:
        observers.append(MatchDebugObserver(match_debug_logger))

    return EnrichmentPipeline(
        team_parser=team_parser,
        league_inferencer=league_inferencer,
        sport_detector=sport_detector,
        flo_mapper=flo_mapper,
        handlers=handlers,
        observers=observers,
        cross_provider_cache=cross_provider_cache,
        enhanced_match_cache=enhanced_cache,
        family_stats_tracker=family_stats_tracker,
        cost_tracker=cost_tracker,
    )
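The factory collects handlers in a list, while EnrichmentHandler links to its successor through next_handler. The design does not say how the two are bridged; one option, shown purely as an assumption, is for the pipeline to link the ordered list before running it. The Handler class and link_handlers helper below are hypothetical.

```python
from typing import Optional, Sequence


class Handler:
    """Minimal stand-in exposing only the chain link used below."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._next_handler: Optional["Handler"] = None


def link_handlers(handlers: Sequence[Handler]) -> Optional[Handler]:
    """Wire each handler to its successor and return the head of the chain."""
    for current, following in zip(handlers, handlers[1:]):
        current._next_handler = following
    return handlers[0] if handlers else None
```

With this approach, reordering priorities stays a matter of reordering the list, which matches the stated goal of Decision 6.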

File Organization

New Directory Structure

backend/epgoat/services/enrichment/
├── __init__.py                      # Public API exports
├── context.py                       # EnrichmentContext (50 lines)
├── pipeline.py                      # EnrichmentPipeline (150 lines)
├── factory.py                       # create_enrichment_pipeline (100 lines)
│
├── handlers/
│   ├── __init__.py                  # Handler exports
│   ├── base.py                      # EnrichmentHandler ABC (50 lines)
│   ├── cache_handlers.py            # 3 cache handlers (200 lines)
│   ├── database_handler.py          # LocalDatabaseHandler (80 lines)
│   ├── regex_handler.py             # RegexMatcherHandler (70 lines)
│   ├── api_handler.py               # APIHandler (150 lines)
│   └── fallback_handler.py          # FallbackHandler (30 lines)
│
├── services/
│   ├── __init__.py                  # Service exports
│   ├── team_parser.py               # TeamParser (200 lines)
│   ├── league_inferencer.py         # LeagueInferencer (250 lines)
│   ├── sport_detector.py            # SportTypeDetector (100 lines)
│   ├── flo_mapper.py                # FloSportsMapper (80 lines)
│   ├── time_extractor.py            # TimeExtractor (120 lines)
│   └── enrichment_builder.py        # EventEnrichmentBuilder (180 lines)
│
└── observers/
    ├── __init__.py                  # Observer exports
    ├── base.py                      # EnrichmentObserver ABC (20 lines)
    └── debug_observer.py            # MatchDebugObserver (60 lines)

TOTAL: ~1,900 lines across 14 focused files (vs 2,068 lines in 1 file)

Public API

# backend/epgoat/services/enrichment/__init__.py

from .context import EnrichmentContext
from .pipeline import EnrichmentPipeline
from .factory import create_enrichment_pipeline

# Handler exports (for testing/advanced usage)
from .handlers import (
    EnrichmentHandler,
    EnhancedMatchCacheHandler,
    EventDetailsCacheHandler,
    LocalDatabaseHandler,
    RegexMatcherHandler,
    CrossProviderCacheHandler,
    APIHandler,
    FallbackHandler,
)

# Service exports (for reuse)
from .services import (
    TeamParser,
    LeagueInferencer,
    SportTypeDetector,
    FloSportsMapper,
    TimeExtractor,
    EventEnrichmentBuilder,
)

# Observer exports
from .observers import EnrichmentObserver, MatchDebugObserver

__all__ = [
    "EnrichmentContext",
    "EnrichmentPipeline",
    "create_enrichment_pipeline",
    # Handlers
    "EnrichmentHandler",
    "EnhancedMatchCacheHandler",
    "EventDetailsCacheHandler",
    "LocalDatabaseHandler",
    "RegexMatcherHandler",
    "CrossProviderCacheHandler",
    "APIHandler",
    "FallbackHandler",
    # Services
    "TeamParser",
    "LeagueInferencer",
    "SportTypeDetector",
    "FloSportsMapper",
    "TimeExtractor",
    "EventEnrichmentBuilder",
    # Observers
    "EnrichmentObserver",
    "MatchDebugObserver",
]

Migration Strategy

Phase 1: Create New Implementation (Parallel)

Timeline: Sprint 1-2

Create new enrichment/ package alongside existing backend/epgoat/services/api_enrichment.py:

  1. Week 1: Foundation
     • Create context.py (EnrichmentContext)
     • Create handlers/base.py (EnrichmentHandler ABC)
     • Create observers/base.py (EnrichmentObserver ABC)
     • Write unit tests for base classes

  2. Week 2: Services
     • Implement 6 support services (team_parser, league_inferencer, etc.)
     • Extract logic from existing EPGEnricher methods
     • Write unit tests for each service (aim for 90%+ coverage)

  3. Week 3: Handlers
     • Implement 7 handlers
     • Wire up dependencies
     • Write unit tests for each handler

  4. Week 4: Pipeline & Factory
     • Implement EnrichmentPipeline
     • Implement create_enrichment_pipeline factory
     • Write integration tests

Validation: New implementation passes all unit tests, integration tests run successfully.

Phase 2: Update Callers

Timeline: Sprint 3

Find all callers of EPGEnricher and update to use new API:

# OLD way (api_enrichment.py):
enricher = EPGEnricher(
    api_key=api_key,
    enable_api=True,
    use_espn_fallback=True,
    failure_tracker=failure_tracker,
    api_cache=api_cache,
    event_database=event_database,
    mismatch_tracker=mismatch_tracker,
    event_details_cache=event_details_cache,
    match_debug_logger=match_debug_logger,
    team_alias_index=team_alias_index,
)

result = enricher.enrich_event(
    channel_name=channel_name,
    family=family,
    payload=payload,
    parsed_time=parsed_time,
    target_date=target_date,
    target_timezone=target_timezone,
)

# NEW way (enrichment/):
pipeline = create_enrichment_pipeline(
    thesportsdb_client=TheSportsDBClient(api_key=api_key),
    espn_client=ESPNAPIClient(),
    enhanced_cache=EnhancedMatchCache(),
    event_details_cache=event_details_cache,
    event_database=event_database,
    cross_provider_cache=CrossProviderEventCache(),
    match_debug_logger=match_debug_logger,
    cost_tracker=CostTracker(api_cost_per_call=0.004),
    family_stats_tracker=FamilyStatsTracker(),
    team_alias_index=team_alias_index,
    regex_matcher=MultiStageRegexMatcher(),
)

result = pipeline.enrich(
    channel_name=channel_name,
    family=family,
    payload=payload,
    parsed_time=parsed_time,
    target_date=target_date,
    target_timezone=target_timezone,
)

Steps:
  1. Find all EPGEnricher instantiations (grep, IDE search)
  2. Update each caller to use factory function
  3. Update imports
  4. Run tests after each update
  5. Commit after each file migrated

Expected Callers:
  • backend/epgoat/application/epg_generator.py (main CLI)
  • Integration tests
  • Any other enrichment workflows

Phase 3: Remove Old Code

Timeline: Sprint 4

After all callers migrated and verified:

  1. Delete backend/epgoat/services/api_enrichment.py
  2. Update imports across codebase
  3. Update documentation references
  4. Run full test suite
  5. Create PR with summary of changes

Validation: All tests pass, no references to old API remain.


Testing Strategy

Unit Tests (Component Isolation)

Services (6 test files):

test_team_parser.py:
  - test_parse_teams_with_vs_separator
  - test_parse_teams_with_at_separator
  - test_parse_teams_with_unicode_separators
  - test_clean_team_name_removes_rank
  - test_clean_team_name_removes_time_patterns
  - test_canonicalize_teams_resolves_aliases
  - test_canonicalize_teams_skips_college_sports

test_league_inferencer.py:
  - test_infer_from_explicit_token
  - test_infer_from_team_ids
  - test_infer_from_database_both_teams
  - test_infer_from_database_single_team
  - test_infer_from_alias_pairing
  - test_candidate_league_deduplication
  - test_flosports_mapping

test_sport_detector.py:
  - test_detect_from_title_prefix
  - test_detect_from_keywords
  - test_get_sport_emoji
  - test_flosports_subcategory_detection

test_flo_mapper.py:
  - test_extract_subcategory_with_colon
  - test_extract_subcategory_without_colon
  - test_map_to_league
  - test_unmapped_subcategory_returns_none

test_time_extractor.py:
  - test_parse_api_time_utc_to_target_timezone
  - test_parse_api_time_fallback_to_parsed_time
  - test_is_tba_no_time
  - test_is_tba_status_indicators
  - test_get_sport_duration_by_family

test_enrichment_builder.py:
  - test_build_from_event
  - test_build_description
  - test_get_event_logos
  - test_get_event_times

Handlers (3 test files):

test_cache_handlers.py:
  - test_enhanced_cache_hit
  - test_enhanced_cache_miss_continues_chain
  - test_details_cache_finds_by_teams_and_date
  - test_details_cache_miss_continues_chain
  - test_cross_provider_cache_hit
  - test_cross_provider_cache_miss

test_database_handler.py:
  - test_local_db_hit
  - test_local_db_miss_continues_chain
  - test_local_db_uses_search_window
  - test_local_db_uses_similarity_threshold

test_api_handler.py:
  - test_thesportsdb_success
  - test_espn_fallback_on_tsdb_failure
  - test_tries_all_candidate_leagues
  - test_stops_on_first_match
  - test_updates_caches_on_match

Pipeline (1 test file):

test_enrichment_pipeline.py:
  - test_enhanced_cache_hit_skips_later_handlers
  - test_fallback_through_all_handlers
  - test_api_match_updates_caches
  - test_observers_notified_on_start_and_complete
  - test_context_accumulates_parsed_data
  - test_error_in_handler_continues_chain
  - test_post_process_updates_all_caches
  - test_family_stats_learning_on_match

Integration Tests

test_integration_enrichment.py:
  - test_full_pipeline_with_all_handlers
  - test_full_pipeline_api_match
  - test_full_pipeline_cache_hit
  - test_full_pipeline_regex_match
  - test_full_pipeline_fallback
  - test_cost_tracking_integration
  - test_debug_logging_integration

Coverage Target

  • Unit Tests: 85%+ coverage per module
  • Integration Tests: 90%+ coverage of pipeline orchestration
  • Overall Target: 85%+ coverage

Test Isolation

Each test file should:
  • Mock external dependencies (API clients, caches, databases)
  • Test one component in isolation
  • Run in <100ms per test
  • Be independent (no shared state)
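A test following these isolation rules might look like the sketch below. It uses unittest.mock.Mock from the standard library; CachedLookup is a hypothetical component invented for the example, not one of the planned handlers, and the cache and API client are mocked so no shared state or network access is involved.

```python
from unittest.mock import Mock


class CachedLookup:
    """Hypothetical component under test: consults a cache before an API."""

    def __init__(self, cache, api_client):
        self._cache = cache
        self._api = api_client

    def lookup(self, key: str):
        cached = self._cache.get(key)
        if cached is not None:
            return cached
        return self._api.fetch(key)


def test_cache_hit_skips_api():
    cache = Mock()
    cache.get.return_value = {"league": "NBA"}
    api = Mock()

    result = CachedLookup(cache, api).lookup("celtics-heat")

    assert result == {"league": "NBA"}
    api.fetch.assert_not_called()  # the slower dependency is never touched


def test_cache_miss_falls_through_to_api():
    cache = Mock()
    cache.get.return_value = None
    api = Mock()
    api.fetch.return_value = {"league": "NHL"}

    result = CachedLookup(cache, api).lookup("bruins-leafs")

    assert result == {"league": "NHL"}
    api.fetch.assert_called_once_with("bruins-leafs")
```

Each test builds its own mocks, so the tests can run in any order and stay well under the per-test time budget.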


Benefits Summary

Code Quality Improvements

Metric                    | Before      | After            | Improvement
--------------------------|-------------|------------------|---------------------------------
File Count                | 1 monolith  | 14 focused files | +13 files (better organization)
Total Lines               | 2,068       | ~1,900           | -168 lines (8% reduction)
Max File Size             | 2,068 lines | 300 lines        | 7x reduction
Max Function Size         | 800 lines   | 50 lines         | 16x reduction
Responsibilities per File | 10+         | 1                | 100% SRP compliance
Cyclomatic Complexity     | >20         | <10              | >50% reduction
Nesting Depth             | 6+ levels   | 2-3 levels       | 50% reduction

SOLID Compliance

Before:
  • ❌ Single Responsibility: 10+ responsibilities
  • ❌ Open/Closed: Modification required for new strategies
  • ⚠️ Liskov Substitution: N/A
  • ⚠️ Interface Segregation: N/A
  • ❌ Dependency Inversion: Dependencies hardcoded

After:
  • ✅ Single Responsibility: 1 per file
  • ✅ Open/Closed: Extend via new handlers
  • ✅ Liskov Substitution: All handlers interchangeable
  • ✅ Interface Segregation: Focused interfaces (Handler, Observer)
  • ✅ Dependency Inversion: Factory injection

Design Patterns Applied

  • ✅ Chain of Responsibility: Handler chain
  • ✅ Observer: Analytics decoupling
  • ✅ Strategy: Interchangeable handlers
  • ✅ Factory: Dependency injection
  • ✅ Repository: Data access (existing)
  • ✅ Service Layer: Business logic (existing)

Maintainability Improvements

Before:
  • ❌ Hard to add new enrichment strategies (800-line function)
  • ❌ Hard to reorder priorities (deeply nested logic)
  • ❌ Hard to test (tightly coupled)
  • ❌ Hard to debug (complex state)

After:
  • ✅ Easy to add new strategies (new handler class)
  • ✅ Easy to reorder priorities (reorder handler list)
  • ✅ Easy to test (isolated components)
  • ✅ Easy to debug (clear context progression)

Performance Implications

No Performance Regression:
  • Same cache hierarchy (same hit rates)
  • Same API call patterns
  • Same regex matching
  • Negligible object creation overhead (<1ms)

Potential Improvements:
  • Better cache layer visibility (measure each layer)
  • Easier to add new optimization strategies
  • Clearer performance bottlenecks


Risks and Mitigations

Risk 1: Breaking Changes Impact Callers

Risk: Callers need to update to new API, could cause temporary breakage.

Mitigation:
  • Parallel implementation (no deletion of old code until migration complete)
  • Clear migration guide with examples
  • Update callers one at a time, test after each
  • Keep old code until all callers migrated

Risk 2: Increased Complexity (More Files)

Risk: 14 files vs 1 file could be harder to navigate.

Mitigation:
  • Clear naming conventions
  • Comprehensive README.md in enrichment/ package
  • Public API exports in __init__.py
  • Strong documentation per module

Risk 3: Testing Overhead

Risk: More components = more tests to write and maintain.

Mitigation:
  • Isolated testing is actually easier (mock dependencies)
  • Reuse test fixtures across similar tests
  • Focus on unit tests first (faster, easier)
  • Integration tests for end-to-end validation

Risk 4: Migration Bugs

Risk: Logic might be lost or changed during refactoring.

Mitigation:
  • Extract methods first, refactor second
  • Run existing tests after each extraction
  • Add new tests alongside extraction
  • Manual testing of key workflows


Success Criteria

Implementation Complete When:

  • [ ] All 14 new files created and tested
  • [ ] Unit test coverage >85% per module
  • [ ] Integration tests pass
  • [ ] All existing tests still pass
  • [ ] Documentation complete (READMEs, docstrings)
  • [ ] Code review approved

Migration Complete When:

  • [ ] All callers updated to new API
  • [ ] Old backend/epgoat/services/api_enrichment.py deleted
  • [ ] All tests pass
  • [ ] No references to old API remain
  • [ ] Performance benchmarks show no regression

Success Metrics:

  • [ ] File size <300 lines per file
  • [ ] Function size <50 lines per function
  • [ ] 100% SOLID compliance
  • [ ] Test coverage >85%
  • [ ] No mypy errors
  • [ ] No Ruff violations
  • [ ] CI pipeline passes

References

  • Phase 2 Code Review (2025-11-03)
  • api_enrichment.py 7-point inspection findings

Implementation Plan

Once the design is approved, use superpowers:writing-plans to create a detailed implementation plan with:
  • Exact file locations
  • Complete code examples
  • Step-by-step verification
  • Zero-context engineer handoff


Design Status: ✅ Approved - Ready for Implementation Planning